Structured Querying of Annotation-Rich Web Text with Shallow Semantics

Authors

  • Xiaonan Li
  • Chengkai Li
  • Cong Yu
Abstract

Information discovery on the Web has so far been dominated by keyword-based document search. In recent years, however, Web users have increasingly needed to search for named entities, e.g., to find all Silicon Valley companies. With existing Web search engines, users have to digest the returned Web pages themselves to find the answers. Entity search has been introduced as a solution to this problem. However, existing entity search systems are limited in their ability to address complex information needs that involve multiple entities and their interrelationships. In this report, we introduce a novel entity-centric structured querying mechanism called Shallow Semantic Query (SSQ) to overcome this limitation. We address two key technical issues for SSQ: ranking and query processing. Comprehensive experiments show that (1) our ranking model outperforms state-of-the-art entity ranking methods, and (2) the proposed query processing algorithm, based on our new Entity-Centric Index, is more efficient than a baseline extended from existing entity search systems.

Similar resources

Exploring the Knowledge in Semi Structured Data Sets with Rich Queries

Semantics can be integrated into search processing during both document analysis and querying stages. We describe a system that incorporates both, semantic annotations of Wikipedia articles into the search process and allows for rich annotation search, enabling users to formulate queries based on their knowledge about how entities relate to one another while simultaneously retaining the freedo...

Semantator: Semantic annotator for converting biomedical text to linked data

More than 80% of biomedical data is embedded in plain text. The unstructured nature of these text-based documents makes it challenging to easily browse and query the data of interest in them. One approach to facilitate browsing and querying biomedical text is to convert the plain text to a linked web of data, i.e., converting data originally in free text to structured formats with defined meta-...

Querying Structured XML Document Collections

The number of XML document collections is increasing, and it’s important to effectively query them. Document semantics is in both the text and the structure. In this paper we describe a query interface towards XML document collections. The interface is automatically tailored to the document structure, as described by its XML Schema. External schema annotation in RDF contains information used to...

Self-Annotation for fine-grained geospatial relation extraction

A great deal of information on the Web is represented in both textual and structured form. The structured form is machine-readable and can be used to augment the textual data. We call this augmentation (the annotation of texts with relations that are included in the structured data) self-annotation. In this paper, we introduce self-annotation as a new supervised learning approach for developin...

Aggregative Approximations for Information Retrieval in Semi-Structured Documents

Today’s Web is huge in size, heterogeneous in both content and data structure, and is mainly accessed through syntactic and/or statistical criteria. Often, the user has to make several searches and investigate tens of documents to find the information of interest. The Semantic Web was introduced to provide "meanings" to the information exchanged on the Web and ensure that sof...

Publication date: 2010